{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data - an introduction to the world of Pandas\n",
    "\n",
    "**Note:** This is an edited version of [Cliburn Chan's](http://people.duke.edu/~ccc14/sta-663-2017/07_Data.html) original tutorial, as part of his Stat-663 course at Duke.  All changes remain licensed as the original, under the terms of the MIT license.\n",
    "\n",
    "Additionally, sections have been merged from [Chris Fonnesbeck's Pandas tutorial from the NGCM Summer Academy](https://github.com/fonnesbeck/ngcm_pandas_2017/blob/master/notebooks/1.%20Introduction%20to%20NumPy%20and%20Pandas.ipynb), which are licensed under [CC0 terms](https://creativecommons.org/share-your-work/public-domain/cc0) (aka 'public domain')."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resources\n",
    "\n",
    "- [The Introduction to Pandas chapter](http://proquest.safaribooksonline.com/9781491957653/pandas_html) in the Python for Data Analysis book by Wes McKinney is essential reading for this topic.  This is the [companion notebook](https://github.com/wesm/pydata-book/blob/2nd-edition/ch05.ipynb) for that chapter.\n",
    "- [Pandas documentation](http://pandas.pydata.org/pandas-docs/stable/)\n",
    "\n",
    "\n",
    "## Pandas\n",
    "\n",
    "**pandas** is a Python package providing fast, flexible, and expressive data structures designed to work with *relational* or *labeled* data both. It is a fundamental high-level building block for doing practical, real world data analysis in Python. \n",
    "\n",
    "pandas is well suited for:\n",
    "\n",
    "- **Tabular** data with heterogeneously-typed columns, as you might find in an SQL table or Excel spreadsheet\n",
    "- Ordered and unordered (not necessarily fixed-frequency) **time series** data.\n",
    "- Arbitrary **matrix** data with row and column labels\n",
    "\n",
    "Virtually any statistical dataset, labeled or unlabeled, can be converted to a pandas data structure for cleaning, transformation, and analysis.\n",
    "\n",
    "\n",
    "### Key features\n",
    "    \n",
    "- Easy handling of **missing data**\n",
    "- **Size mutability**: columns can be inserted and deleted from DataFrame and higher dimensional objects\n",
    "- Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically\n",
    "- Powerful, flexible **group by functionality** to perform split-apply-combine operations on data sets\n",
    "- Intelligent label-based **slicing, fancy indexing, and subsetting** of large data sets\n",
    "- Intuitive **merging and joining** data sets\n",
    "- Flexible **reshaping and pivoting** of data sets\n",
    "- **Hierarchical labeling** of axes\n",
    "- Robust **IO tools** for loading data from flat files, Excel files, databases, and HDF5\n",
    "- **Time series functionality**: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "from pandas import Series, DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.style.use('seaborn-dark')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with Series\n",
    "\n",
    "* A pandas Series is a generationalization of 1d numpy array\n",
    "* A series has an *index* that labels each element in the vector.\n",
    "* A `Series` can be thought of as an ordered key-value store."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 6, 7, 8, 9])"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.array(range(5,10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = Series(range(5,10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    5\n",
       "1    6\n",
       "2    7\n",
       "3    8\n",
       "4    9\n",
       "dtype: int64"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### We can treat Series objects much like numpy vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(35, 7.0, 1.5811388300841898)"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.sum(), x.mean(), x.std()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    25\n",
       "1    36\n",
       "2    49\n",
       "3    64\n",
       "4    81\n",
       "dtype: int64"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3    8\n",
       "4    9\n",
       "dtype: int64"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[x >= 8]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series can also contain more information than numpy vectors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### You can always use standard positional indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    6\n",
       "2    7\n",
       "3    8\n",
       "dtype: int64"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[1:4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Series index\n",
    "\n",
    "But you can also assign labeled indexes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    5\n",
       "b    6\n",
       "c    7\n",
       "d    8\n",
       "e    9\n",
       "dtype: int64"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.index = list('abcde')\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Note that with labels, the end index is included"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    6\n",
       "c    7\n",
       "d    8\n",
       "dtype: int64"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x['b':'d']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Even when you have a labeled index, positional arguments still work"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    6\n",
       "c    7\n",
       "d    8\n",
       "dtype: int64"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[1:4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Working with missing data\n",
    "\n",
    "Missing data is indicated with NaN (not a number)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    10.0\n",
       "1     NaN\n",
       "2     NaN\n",
       "3    13.0\n",
       "4    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = Series([10, np.nan, np.nan, 13, 14])\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Concatenating two series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a     5.0\n",
       "b     6.0\n",
       "c     7.0\n",
       "d     8.0\n",
       "e     9.0\n",
       "0    10.0\n",
       "1     NaN\n",
       "2     NaN\n",
       "3    13.0\n",
       "4    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z = pd.concat([x, y])\n",
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reset index to default"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5.0\n",
       "1     6.0\n",
       "2     7.0\n",
       "3     8.0\n",
       "4     9.0\n",
       "5    10.0\n",
       "6     NaN\n",
       "7     NaN\n",
       "8    13.0\n",
       "9    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z = z.reset_index(drop=True)\n",
    "z"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     25.0\n",
       "1     36.0\n",
       "2     49.0\n",
       "3     64.0\n",
       "4     81.0\n",
       "5    100.0\n",
       "6      NaN\n",
       "7      NaN\n",
       "8    169.0\n",
       "9    196.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z**2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `pandas` aggregate functions ignore missing data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(72.0, 9.0, 3.2071349029490928)"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.sum(), z.mean(), z.std()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Selecting missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6   NaN\n",
       "7   NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z[z.isnull()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Selecting non-missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5.0\n",
       "1     6.0\n",
       "2     7.0\n",
       "3     8.0\n",
       "4     9.0\n",
       "5    10.0\n",
       "8    13.0\n",
       "9    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z[z.notnull()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Replacement of missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5.0\n",
       "1     6.0\n",
       "2     7.0\n",
       "3     8.0\n",
       "4     9.0\n",
       "5    10.0\n",
       "6     0.0\n",
       "7     0.0\n",
       "8    13.0\n",
       "9    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.fillna(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5.0\n",
       "1     6.0\n",
       "2     7.0\n",
       "3     8.0\n",
       "4     9.0\n",
       "5    10.0\n",
       "6    10.0\n",
       "7    10.0\n",
       "8    13.0\n",
       "9    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.fillna(method='ffill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5.0\n",
       "1     6.0\n",
       "2     7.0\n",
       "3     8.0\n",
       "4     9.0\n",
       "5    10.0\n",
       "6    13.0\n",
       "7    13.0\n",
       "8    13.0\n",
       "9    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.fillna(method='bfill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5.0\n",
       "1     6.0\n",
       "2     7.0\n",
       "3     8.0\n",
       "4     9.0\n",
       "5    10.0\n",
       "6     9.0\n",
       "7     9.0\n",
       "8    13.0\n",
       "9    14.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.fillna(z.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Working with dates / times\n",
    "\n",
    "We will see more date/time handling in the DataFrame section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "z.index = pd.date_range('01-Jan-2016', periods=len(z))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2016-01-01     5.0\n",
       "2016-01-02     6.0\n",
       "2016-01-03     7.0\n",
       "2016-01-04     8.0\n",
       "2016-01-05     9.0\n",
       "2016-01-06    10.0\n",
       "2016-01-07     NaN\n",
       "2016-01-08     NaN\n",
       "2016-01-09    13.0\n",
       "2016-01-10    14.0\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Intelligent aggregation over datetime ranges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2016-01-03    18.0\n",
       "2016-01-10    54.0\n",
       "Freq: W-SUN, dtype: float64"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.resample('W').sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Formatting datetime objects (see http://strftime.org)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Jan 01, 2016', 'Jan 02, 2016', 'Jan 03, 2016', 'Jan 04, 2016',\n",
       "       'Jan 05, 2016', 'Jan 06, 2016', 'Jan 07, 2016', 'Jan 08, 2016',\n",
       "       'Jan 09, 2016', 'Jan 10, 2016'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.index.strftime('%b %d, %Y')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### DataFrames\n",
    "\n",
    "Inevitably, we want to be able to store, view and manipulate data that is *multivariate*, where for every index there are multiple fields or columns of data, often of varying data type.\n",
    "\n",
    "A `DataFrame` is a tabular data structure, encapsulating multiple series like columns in a spreadsheet.  It is directly inspired by the R DataFrame."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Titanic data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0   7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0  71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0   7.9250        S  Third   \n",
       "3         1       1  female  35.0      1      0  53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0   8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv'\n",
    "titanic = pd.read_csv(url)\n",
    "titanic.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(891, 15)"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "13365"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',\n",
       "       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',\n",
       "       'alive', 'alone'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For display purposes, we will drop some columns\n",
    "titanic = titanic[['survived', 'sex', 'age', 'fare',\n",
    "                   'embarked', 'class', 'who', 'deck', 'embark_town',]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "survived         int64\n",
       "sex             object\n",
       "age            float64\n",
       "fare           float64\n",
       "embarked        object\n",
       "class           object\n",
       "who             object\n",
       "deck            object\n",
       "embark_town     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summarizing a data frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>count</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>714.000000</td>\n",
       "      <td>891.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>mean</td>\n",
       "      <td>0.383838</td>\n",
       "      <td>29.699118</td>\n",
       "      <td>32.204208</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>std</td>\n",
       "      <td>0.486592</td>\n",
       "      <td>14.526497</td>\n",
       "      <td>49.693429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>min</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.420000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>25%</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>20.125000</td>\n",
       "      <td>7.910400</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>50%</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>28.000000</td>\n",
       "      <td>14.454200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>75%</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>38.000000</td>\n",
       "      <td>31.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>max</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>80.000000</td>\n",
       "      <td>512.329200</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         survived         age        fare\n",
       "count  891.000000  714.000000  891.000000\n",
       "mean     0.383838   29.699118   32.204208\n",
       "std      0.486592   14.526497   49.693429\n",
       "min      0.000000    0.420000    0.000000\n",
       "25%      0.000000   20.125000    7.910400\n",
       "50%      0.000000   28.000000   14.454200\n",
       "75%      1.000000   38.000000   31.000000\n",
       "max      1.000000   80.000000  512.329200"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8.4583</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>54.0</td>\n",
       "      <td>51.8625</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>E</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>27.0</td>\n",
       "      <td>11.1333</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>C</td>\n",
       "      <td>Second</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>4.0</td>\n",
       "      <td>16.7000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>G</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>20.0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>39.0</td>\n",
       "      <td>31.2750</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>0</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>7.8542</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>55.0</td>\n",
       "      <td>16.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>29.1250</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>13.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>female</td>\n",
       "      <td>31.0</td>\n",
       "      <td>18.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.2250</td>\n",
       "      <td>C</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    survived     sex   age     fare embarked   class    who deck  embark_town\n",
       "0          0    male  22.0   7.2500        S   Third    man  NaN  Southampton\n",
       "1          1  female  38.0  71.2833        C   First  woman    C    Cherbourg\n",
       "2          1  female  26.0   7.9250        S   Third  woman  NaN  Southampton\n",
       "3          1  female  35.0  53.1000        S   First  woman    C  Southampton\n",
       "4          0    male  35.0   8.0500        S   Third    man  NaN  Southampton\n",
       "5          0    male   NaN   8.4583        Q   Third    man  NaN   Queenstown\n",
       "6          0    male  54.0  51.8625        S   First    man    E  Southampton\n",
       "7          0    male   2.0  21.0750        S   Third  child  NaN  Southampton\n",
       "8          1  female  27.0  11.1333        S   Third  woman  NaN  Southampton\n",
       "9          1  female  14.0  30.0708        C  Second  child  NaN    Cherbourg\n",
       "10         1  female   4.0  16.7000        S   Third  child    G  Southampton\n",
       "11         1  female  58.0  26.5500        S   First  woman    C  Southampton\n",
       "12         0    male  20.0   8.0500        S   Third    man  NaN  Southampton\n",
       "13         0    male  39.0  31.2750        S   Third    man  NaN  Southampton\n",
       "14         0  female  14.0   7.8542        S   Third  child  NaN  Southampton\n",
       "15         1  female  55.0  16.0000        S  Second  woman  NaN  Southampton\n",
       "16         0    male   2.0  29.1250        Q   Third  child  NaN   Queenstown\n",
       "17         1    male   NaN  13.0000        S  Second    man  NaN  Southampton\n",
       "18         0  female  31.0  18.0000        S   Third  woman  NaN  Southampton\n",
       "19         1  female   NaN   7.2250        C   Third  woman  NaN    Cherbourg"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.head(20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>886</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>27.0</td>\n",
       "      <td>13.00</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>887</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>30.00</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>B</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>888</td>\n",
       "      <td>0</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>23.45</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>889</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>26.0</td>\n",
       "      <td>30.00</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>890</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>32.0</td>\n",
       "      <td>7.75</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     survived     sex   age   fare embarked   class    who deck  embark_town\n",
       "886         0    male  27.0  13.00        S  Second    man  NaN  Southampton\n",
       "887         1  female  19.0  30.00        S   First  woman    B  Southampton\n",
       "888         0  female   NaN  23.45        S   Third  woman  NaN  Southampton\n",
       "889         1    male  26.0  30.00        C   First    man    C    Cherbourg\n",
       "890         0    male  32.0   7.75        Q   Third    man  NaN   Queenstown"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.tail(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['survived', 'sex', 'age', 'fare', 'embarked', 'class', 'who', 'deck',\n",
       "       'embark_town'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=891, step=1)"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Indexing\n",
    "\n",
    "The default indexing mode for dataframes with `df[X]` is to access the DataFrame's *columns*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Third</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>First</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>Third</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>First</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>Third</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      sex   age  class\n",
       "0    male  22.0  Third\n",
       "1  female  38.0  First\n",
       "2  female  26.0  Third\n",
       "3  female  35.0  First\n",
       "4    male  35.0  Third"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic[['sex', 'age', 'class']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the `iloc` helper for indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived     sex   age     fare embarked  class    who deck  embark_town\n",
       "0         0    male  22.0   7.2500        S  Third    man  NaN  Southampton\n",
       "1         1  female  38.0  71.2833        C  First  woman    C    Cherbourg\n",
       "2         1  female  26.0   7.9250        S  Third  woman  NaN  Southampton"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.indexing._iLocIndexer at 0x7f2cef1bb9b0>"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.iloc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "survived                 0\n",
       "sex                   male\n",
       "age                     22\n",
       "fare                  7.25\n",
       "embarked                 S\n",
       "class                Third\n",
       "who                    man\n",
       "deck                   NaN\n",
       "embark_town    Southampton\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.iloc[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived     sex   age     fare embarked  class    who deck  embark_town\n",
       "0         0    male  22.0   7.2500        S  Third    man  NaN  Southampton\n",
       "1         1  female  38.0  71.2833        C  First  woman    C    Cherbourg\n",
       "2         1  female  26.0   7.9250        S  Third  woman  NaN  Southampton\n",
       "3         1  female  35.0  53.1000        S  First  woman    C  Southampton\n",
       "4         0    male  35.0   8.0500        S  Third    man  NaN  Southampton"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.iloc[0:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>4.0</td>\n",
       "      <td>16.7000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>G</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8.4583</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    survived     sex   age     fare embarked  class    who deck  embark_town\n",
       "0          0    male  22.0   7.2500        S  Third    man  NaN  Southampton\n",
       "10         1  female   4.0  16.7000        S  Third  child    G  Southampton\n",
       "1          1  female  38.0  71.2833        C  First  woman    C    Cherbourg\n",
       "5          0    male   NaN   8.4583        Q  Third    man  NaN   Queenstown"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.iloc[ [0, 10, 1, 5] ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10     4.0\n",
       "11    58.0\n",
       "12    20.0\n",
       "13    39.0\n",
       "14    14.0\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.iloc[10:15]['age']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>58.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>39.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     age\n",
       "10   4.0\n",
       "11  58.0\n",
       "12  20.0\n",
       "13  39.0\n",
       "14  14.0"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.iloc[10:15][  ['age'] ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Sorting and ordering data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived     sex   age     fare embarked  class    who deck  embark_town\n",
       "0         0    male  22.0   7.2500        S  Third    man  NaN  Southampton\n",
       "1         1  female  38.0  71.2833        C  First  woman    C    Cherbourg\n",
       "2         1  female  26.0   7.9250        S  Third  woman  NaN  Southampton\n",
       "3         1  female  35.0  53.1000        S  First  woman    C  Southampton\n",
       "4         0    male  35.0   8.0500        S  Third    man  NaN  Southampton"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `sort_index` method is designed to sort a DataFrame by either its index or its columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>890</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>32.0</td>\n",
       "      <td>7.75</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>889</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>26.0</td>\n",
       "      <td>30.00</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>888</td>\n",
       "      <td>0</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>23.45</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>887</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>30.00</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>B</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>886</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>27.0</td>\n",
       "      <td>13.00</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     survived     sex   age   fare embarked   class    who deck  embark_town\n",
       "890         0    male  32.0   7.75        Q   Third    man  NaN   Queenstown\n",
       "889         1    male  26.0  30.00        C   First    man    C    Cherbourg\n",
       "888         0  female   NaN  23.45        S   Third  woman  NaN  Southampton\n",
       "887         1  female  19.0  30.00        S   First  woman    B  Southampton\n",
       "886         0    male  27.0  13.00        S  Second    man  NaN  Southampton"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.sort_index(ascending=False).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the Titanic index is already sorted, it's easier to illustrate how to use it for the index with a small test DF:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>100</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>234</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>150</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A\n",
       "100  1\n",
       "29   2\n",
       "234  3\n",
       "1    4\n",
       "150  5"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=['A'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>100</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>150</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>234</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A\n",
       "1    4\n",
       "29   2\n",
       "100  1\n",
       "150  5\n",
       "234  3"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_index() # same as df.sort_index('index')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas also makes it easy to sort on the *values* of the DF:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>803</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>C</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>755</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>644</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>C</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>469</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>C</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>78</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>0.83</td>\n",
       "      <td>29.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     survived     sex   age     fare embarked   class    who deck  embark_town\n",
       "803         1    male  0.42   8.5167        C   Third  child  NaN    Cherbourg\n",
       "755         1    male  0.67  14.5000        S  Second  child  NaN  Southampton\n",
       "644         1  female  0.75  19.2583        C   Third  child  NaN    Cherbourg\n",
       "469         1  female  0.75  19.2583        C   Third  child  NaN    Cherbourg\n",
       "78          1    male  0.83  29.0000        S  Second  child  NaN  Southampton"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.sort_values('age', ascending=True).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can sort on more than one column in a single call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>164</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>1.0</td>\n",
       "      <td>39.6875</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>386</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>1.0</td>\n",
       "      <td>46.9000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>29.1250</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>119</td>\n",
       "      <td>0</td>\n",
       "      <td>female</td>\n",
       "      <td>2.0</td>\n",
       "      <td>31.2750</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     survived     sex  age     fare embarked  class    who deck  embark_town\n",
       "164         0    male  1.0  39.6875        S  Third  child  NaN  Southampton\n",
       "386         0    male  1.0  46.9000        S  Third  child  NaN  Southampton\n",
       "7           0    male  2.0  21.0750        S  Third  child  NaN  Southampton\n",
       "16          0    male  2.0  29.1250        Q  Third  child  NaN   Queenstown\n",
       "119         0  female  2.0  31.2750        S  Third  child  NaN  Southampton"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.sort_values(['survived', 'age'], ascending=[True, True]).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note:* both the index and the columns can be named:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>851</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>74.0</td>\n",
       "      <td>7.7750</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>96</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>71.0</td>\n",
       "      <td>34.6542</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>A</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>493</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>71.0</td>\n",
       "      <td>49.5042</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>116</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>70.5</td>\n",
       "      <td>7.7500</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>672</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>70.0</td>\n",
       "      <td>10.5000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived   sex   age     fare embarked   class  who deck  \\\n",
       "id                                                                     \n",
       "851                0  male  74.0   7.7750        S   Third  man  NaN   \n",
       "96                 0  male  71.0  34.6542        C   First  man    A   \n",
       "493                0  male  71.0  49.5042        C   First  man  NaN   \n",
       "116                0  male  70.5   7.7500        Q   Third  man  NaN   \n",
       "672                0  male  70.0  10.5000        S  Second  man  NaN   \n",
       "\n",
       "attributes  embark_town  \n",
       "id                       \n",
       "851         Southampton  \n",
       "96            Cherbourg  \n",
       "493           Cherbourg  \n",
       "116          Queenstown  \n",
       "672         Southampton  "
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "t = titanic.sort_values(['survived', 'age'], ascending=[True, False])\n",
    "t.index.name = 'id'\n",
    "t.columns.name = 'attributes'\n",
    "t.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Grouping data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [],
   "source": [
    "sex_class = titanic.groupby(['sex', 'class'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is a GroubBy object?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f2cef14e410>"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sex_class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name: ('female', 'First') \n",
      "group:\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived     sex   age     fare embarked  class    who deck  \\\n",
       "1                  1  female  38.0  71.2833        C  First  woman    C   \n",
       "3                  1  female  35.0  53.1000        S  First  woman    C   \n",
       "\n",
       "attributes  embark_town  \n",
       "1             Cherbourg  \n",
       "3           Southampton  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name: ('female', 'Second') \n",
      "group:\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>C</td>\n",
       "      <td>Second</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>55.0</td>\n",
       "      <td>16.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived     sex   age     fare embarked   class    who deck  \\\n",
       "9                  1  female  14.0  30.0708        C  Second  child  NaN   \n",
       "15                 1  female  55.0  16.0000        S  Second  woman  NaN   \n",
       "\n",
       "attributes  embark_town  \n",
       "9             Cherbourg  \n",
       "15          Southampton  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name: ('female', 'Third') \n",
      "group:\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>27.0</td>\n",
       "      <td>11.1333</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived     sex   age     fare embarked  class    who deck  \\\n",
       "2                  1  female  26.0   7.9250        S  Third  woman  NaN   \n",
       "8                  1  female  27.0  11.1333        S  Third  woman  NaN   \n",
       "\n",
       "attributes  embark_town  \n",
       "2           Southampton  \n",
       "8           Southampton  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name: ('male', 'First') \n",
      "group:\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>54.0</td>\n",
       "      <td>51.8625</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>E</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>23</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>28.0</td>\n",
       "      <td>35.5000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>A</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived   sex   age     fare embarked  class  who deck  \\\n",
       "6                  0  male  54.0  51.8625        S  First  man    E   \n",
       "23                 1  male  28.0  35.5000        S  First  man    A   \n",
       "\n",
       "attributes  embark_town  \n",
       "6           Southampton  \n",
       "23          Southampton  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name: ('male', 'Second') \n",
      "group:\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>13.0</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived   sex   age  fare embarked   class  who deck  embark_town\n",
       "17                 1  male   NaN  13.0        S  Second  man  NaN  Southampton\n",
       "20                 0  male  35.0  26.0        S  Second  man  NaN  Southampton"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name: ('male', 'Third') \n",
      "group:\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>7.25</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>8.05</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived   sex   age  fare embarked  class  who deck  embark_town\n",
       "0                  0  male  22.0  7.25        S  Third  man  NaN  Southampton\n",
       "4                  0  male  35.0  8.05        S  Third  man  NaN  Southampton"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display\n",
    "\n",
    "for name, group in sex_class:\n",
    "    print('name:', name, '\\ngroup:\\n')\n",
    "    display(group.head(2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>C</td>\n",
       "      <td>Second</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>55.0</td>\n",
       "      <td>16.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>41</td>\n",
       "      <td>0</td>\n",
       "      <td>female</td>\n",
       "      <td>27.0</td>\n",
       "      <td>21.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>43</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>3.0</td>\n",
       "      <td>41.5792</td>\n",
       "      <td>C</td>\n",
       "      <td>Second</td>\n",
       "      <td>child</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cherbourg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>53</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>29.0</td>\n",
       "      <td>26.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>woman</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes  survived     sex   age     fare embarked   class    who deck  \\\n",
       "9                  1  female  14.0  30.0708        C  Second  child  NaN   \n",
       "15                 1  female  55.0  16.0000        S  Second  woman  NaN   \n",
       "41                 0  female  27.0  21.0000        S  Second  woman  NaN   \n",
       "43                 1  female   3.0  41.5792        C  Second  child  NaN   \n",
       "53                 1  female  29.0  26.0000        S  Second  woman  NaN   \n",
       "\n",
       "attributes  embark_town  \n",
       "9             Cherbourg  \n",
       "15          Southampton  \n",
       "41          Southampton  \n",
       "43            Cherbourg  \n",
       "53          Southampton  "
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sex_class.get_group(('female', 'Second')).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The GroubBy object has a number of aggregation methods that will then compute summary statistics over the group members, e.g.:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th>class</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td rowspan=\"3\" valign=\"top\">female</td>\n",
       "      <td>First</td>\n",
       "      <td>94</td>\n",
       "      <td>85</td>\n",
       "      <td>94</td>\n",
       "      <td>92</td>\n",
       "      <td>94</td>\n",
       "      <td>81</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Second</td>\n",
       "      <td>76</td>\n",
       "      <td>74</td>\n",
       "      <td>76</td>\n",
       "      <td>76</td>\n",
       "      <td>76</td>\n",
       "      <td>10</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Third</td>\n",
       "      <td>144</td>\n",
       "      <td>102</td>\n",
       "      <td>144</td>\n",
       "      <td>144</td>\n",
       "      <td>144</td>\n",
       "      <td>6</td>\n",
       "      <td>144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td rowspan=\"3\" valign=\"top\">male</td>\n",
       "      <td>First</td>\n",
       "      <td>122</td>\n",
       "      <td>101</td>\n",
       "      <td>122</td>\n",
       "      <td>122</td>\n",
       "      <td>122</td>\n",
       "      <td>94</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Second</td>\n",
       "      <td>108</td>\n",
       "      <td>99</td>\n",
       "      <td>108</td>\n",
       "      <td>108</td>\n",
       "      <td>108</td>\n",
       "      <td>6</td>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Third</td>\n",
       "      <td>347</td>\n",
       "      <td>253</td>\n",
       "      <td>347</td>\n",
       "      <td>347</td>\n",
       "      <td>347</td>\n",
       "      <td>6</td>\n",
       "      <td>347</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes     survived  age  fare  embarked  who  deck  embark_town\n",
       "sex    class                                                        \n",
       "female First         94   85    94        92   94    81           92\n",
       "       Second        76   74    76        76   76    10           76\n",
       "       Third        144  102   144       144  144     6          144\n",
       "male   First        122  101   122       122  122    94          122\n",
       "       Second       108   99   108       108  108     6          108\n",
       "       Third        347  253   347       347  347     6          347"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sex_class.count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Why Kate Winslett survived and Leonardo DiCaprio didn't"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th>class</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td rowspan=\"3\" valign=\"top\">female</td>\n",
       "      <td>First</td>\n",
       "      <td>0.968085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Second</td>\n",
       "      <td>0.921053</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Third</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td rowspan=\"3\" valign=\"top\">male</td>\n",
       "      <td>First</td>\n",
       "      <td>0.368852</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Second</td>\n",
       "      <td>0.157407</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Third</td>\n",
       "      <td>0.135447</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes     survived\n",
       "sex    class           \n",
       "female First   0.968085\n",
       "       Second  0.921053\n",
       "       Third   0.500000\n",
       "male   First   0.368852\n",
       "       Second  0.157407\n",
       "       Third   0.135447"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sex_class.mean()[['survived']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of the females who were in first class, count the number from each embarking town"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>attributes</th>\n",
       "      <th>survived</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>deck</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>embark_town</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>43</td>\n",
       "      <td>43</td>\n",
       "      <td>38</td>\n",
       "      <td>43</td>\n",
       "      <td>43</td>\n",
       "      <td>43</td>\n",
       "      <td>43</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Queenstown</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Southampton</td>\n",
       "      <td>48</td>\n",
       "      <td>48</td>\n",
       "      <td>44</td>\n",
       "      <td>48</td>\n",
       "      <td>48</td>\n",
       "      <td>48</td>\n",
       "      <td>48</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "attributes   survived  sex  age  fare  embarked  class  who  deck\n",
       "embark_town                                                      \n",
       "Cherbourg          43   43   38    43        43     43   43    35\n",
       "Queenstown          1    1    1     1         1      1    1     1\n",
       "Southampton        48   48   44    48        48     48   48    43"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sex_class.get_group(('female', 'First')).groupby('embark_town').count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since `count` counts non-missing data, we're really interested in the maximum value for each row, which we can obtain directly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "embark_town\n",
       "Cherbourg      43\n",
       "Queenstown      1\n",
       "Southampton    48\n",
       "dtype: int64"
      ]
     },
     "execution_count": 128,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sex_class.get_group(('female', 'First')).groupby('embark_town').count().max('columns')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Cross-tabulation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>class</th>\n",
       "      <th>First</th>\n",
       "      <th>Second</th>\n",
       "      <th>Third</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>survived</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>80</td>\n",
       "      <td>97</td>\n",
       "      <td>372</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>136</td>\n",
       "      <td>87</td>\n",
       "      <td>119</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "class     First  Second  Third\n",
       "survived                      \n",
       "0            80      97    372\n",
       "1           136      87    119"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(titanic.survived, titanic['class'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### We can also get multiple summaries at the same time\n",
    "\n",
    "The `agg` method is the most flexible, as it allows us to specify directly which functions we want to call, and where:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [],
   "source": [
    "def my_func(x):\n",
    "    return np.max(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>embarked</th>\n",
       "      <th colspan=\"3\" halign=\"left\">age</th>\n",
       "      <th>survived</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>median</th>\n",
       "      <th>my_func</th>\n",
       "      <th>sum</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>embark_town</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>43</td>\n",
       "      <td>36.052632</td>\n",
       "      <td>37.0</td>\n",
       "      <td>60.0</td>\n",
       "      <td>42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Queenstown</td>\n",
       "      <td>1</td>\n",
       "      <td>33.000000</td>\n",
       "      <td>33.0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Southampton</td>\n",
       "      <td>48</td>\n",
       "      <td>32.704545</td>\n",
       "      <td>33.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>46</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            embarked        age                survived\n",
       "               count       mean median my_func      sum\n",
       "embark_town                                            \n",
       "Cherbourg         43  36.052632   37.0    60.0       42\n",
       "Queenstown         1  33.000000   33.0    33.0        1\n",
       "Southampton       48  32.704545   33.0    63.0       46"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mapped_funcs = {'embarked': 'count', \n",
    "                'age': ('mean', 'median', my_func), \n",
    "                'survived': sum}\n",
    "\n",
    "sex_class.get_group(('female', 'First')).groupby('embark_town').agg(mapped_funcs)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
